AITopics | marginal variance

Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce - the entropy increase at a query when removed. This criterion credits examples for resolving predictive uncertainty rather than label noise. To scale to modern networks, we approximate information loss using a Gaussian Process surrogate built from tangent features. We show this aligns with classical influence scores for single-example attribution while promoting diversity for subsets. For even larger-scale retrieval, we relax to an information-gain objective and add a variance correction for scalable attribution in vector databases. Experiments show competitive performance on counterfactual sensitivity, ground-truth retrieval and coreset selection, showing that our method scales to modern architectures while bridging principled measures with practice.

abayesian information-theoretic approach, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

2604.03858

Country: Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

BewareoftheSimulatedDAG! CausalDiscoveryBenchmarksMayBeEasyToGame

Neural Information Processing SystemsFeb-11-2026, 17:17:56 GMT

Here, we show that marginal variance tends to increase along the causal order for generically sampled additivenoise models.

artificial intelligence, machine learning, variance, (17 more...)

Neural Information Processing Systems

Country: Europe > Denmark (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

e987eff4a7c7b7e580d659feb6f60c1a-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 17:17:53 GMT

algorithm, variance, varsortability, (14 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > Limburg > Maastricht (0.04)
North America > Greenland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

Beware of the Simulated DAG! Causal Discovery Benchmarks May Be Easy to Game

Neural Information Processing SystemsDec-25-2025, 03:51:50 GMT

Simulated DAG models may exhibit properties that, perhaps inadvertently, render their structure identifiable and unexpectedly affect structure learning algorithms. Here, we show that marginal variance tends to increase along the causal order for generically sampled additive noise models. We introduce varsortability as a measure of the agreement between the order of increasing marginal variance and the causal order. For commonly sampled graphs and model parameters, we show that the remarkable performance of some continuous structure learning algorithms can be explained by high varsortability and matched by a simple baseline method. Yet, this performance may not transfer to real-world data where varsortability may be moderate or dependent on the choice of measurement scales. On standardized data, the same algorithms fail to identify the ground-truth DAG or its Markov equivalence class. While standardization removes the pattern in marginal variance, we show that data generating processes that incur high varsortability also leave a distinct covariance pattern that may be exploited even after standardization. Our findings challenge the significance of generic benchmarks with independently drawn parameters.

causal discovery, name change, simulated dag, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

e987eff4a7c7b7e580d659feb6f60c1a-Supplemental.pdf

Neural Information Processing SystemsAug-18-2025, 10:58:39 GMT

Here, we show that marginal variance tends to increase along the causal order for generically sampled additive noise models.

artificial intelligence, machine learning, variance, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > Netherlands > Limburg > Maastricht (0.04)
North America > Greenland (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Add feedback

e987eff4a7c7b7e580d659feb6f60c1a-Paper.pdf

Neural Information Processing SystemsAug-18-2025, 10:58:36 GMT

Here, we show that marginal variance tends to increase along the causal order for generically sampled additive noise models.

artificial intelligence, machine learning, variance, (17 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > Limburg > Maastricht (0.04)
North America > Greenland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

Beware of the Simulated DAG! Causal Discovery Benchmarks May Be Easy to Game

Neural Information Processing SystemsJan-19-2025, 11:47:23 GMT

Simulated DAG models may exhibit properties that, perhaps inadvertently, render their structure identifiable and unexpectedly affect structure learning algorithms. Here, we show that marginal variance tends to increase along the causal order for generically sampled additive noise models. We introduce varsortability as a measure of the agreement between the order of increasing marginal variance and the causal order. For commonly sampled graphs and model parameters, we show that the remarkable performance of some continuous structure learning algorithms can be explained by high varsortability and matched by a simple baseline method. Yet, this performance may not transfer to real-world data where varsortability may be moderate or dependent on the choice of measurement scales.

causal discovery, marginal variance, simulated dag, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Standardizing Structural Causal Models

Ormaniec, Weronika, Sussex, Scott, Lorch, Lars, Schölkopf, Bernhard, Krause, Andreas

arXiv.org Machine LearningJun-17-2024

Synthetic datasets generated by structural causal models (SCMs) are commonly used for benchmarking causal structure learning algorithms. However, the variances and pairwise correlations in SCM data tend to increase along the causal ordering. Several popular algorithms exploit these artifacts, possibly leading to conclusions that do not generalize to real-world settings. Existing metrics like $\operatorname{Var}$-sortability and $\operatorname{R^2}$-sortability quantify these patterns, but they do not provide tools to remedy them. To address this, we propose internally-standardized structural causal models (iSCMs), a modification of SCMs that introduces a standardization operation at each variable during the generative process. By construction, iSCMs are not $\operatorname{Var}$-sortable, and as we show experimentally, not $\operatorname{R^2}$-sortable either for commonly-used graph families. Moreover, contrary to the post-hoc standardization of data generated by standard SCMs, we prove that linear iSCMs are less identifiable from prior knowledge on the weights and do not collapse to deterministic relationships in large systems, which may make iSCMs a useful model in causal inference beyond the benchmarking problem studied here.

iscm, scm, variance, (16 more...)

arXiv.org Machine Learning

2406.11601

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.80)

Add feedback

An Ordering of Divergences for Variational Inference with Factorized Gaussian Approximations

Margossian, Charles C., Pillaud-Vivien, Loucas, Saul, Lawrence K.

arXiv.org Machine LearningMar-20-2024

Given an intractable distribution $p$, the problem of variational inference (VI) is to compute the best approximation $q$ from some more tractable family $\mathcal{Q}$. Most commonly the approximation is found by minimizing a Kullback-Leibler (KL) divergence. However, there exist other valid choices of divergences, and when $\mathcal{Q}$ does not contain~$p$, each divergence champions a different solution. We analyze how the choice of divergence affects the outcome of VI when a Gaussian with a dense covariance matrix is approximated by a Gaussian with a diagonal covariance matrix. In this setting we show that different divergences can be \textit{ordered} by the amount that their variational approximations misestimate various measures of uncertainty, such as the variance, precision, and entropy. We also derive an impossibility theorem showing that no two of these measures can be simultaneously matched by a factorized approximation; hence, the choice of divergence informs which measure, if any, is correctly estimated. Our analysis covers the KL divergence, the R\'enyi divergences, and a score-based divergence that compares $\nabla\log p$ and $\nabla\log q$. We empirically evaluate whether these orderings hold when VI is used to approximate non-Gaussian distributions.

divergence, marginal variance, variance, (16 more...)

arXiv.org Machine Learning

2403.13748

Country:

Asia > Middle East > Jordan (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

Watanabe, Shuhei, Bansal, Archit, Hutter, Frank

arXiv.org Artificial IntelligenceMay-26-2023

The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2304.10255

Country: Europe > Germany > Baden-Württemberg > Freiburg (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback